Fix/realtime tts voice rewire #181

Merged
rmittal-github merged 10 commits into nvidia-riva:main from atomer-nvidia:fix/realtime-tts-voice-rewire on May 26, 2026

@atomer-nvidia
Contributor

No description provided.

atomer-nvidia and others added 5 commits May 14, 2026 11:11
Defer the pyaudio import to the points where it is actually needed
(MicrophoneStream.__enter__, SoundCallBack.__init__, list_*_devices,
get_*_info). Default WAV-output flows now work on machines without
PortAudio headers installed. When pyaudio is missing, raise an
ImportError that explicitly tells the user to install portaudio19-dev
first, addressing the VDR finding that users on a fresh machine were blocked
by a bare ModuleNotFoundError with no install instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The riva-asr/nmt/tts client scripts historically exit 0 on most error
paths — including "Unavailable model", connection refused, empty/invalid
input, and missing files — which causes CI pipelines composing these
scripts via && chains to silently swallow real failures.

Add a cli_main decorator that translates uncaught exceptions into a
small, consistent set of exit codes:

  2 = bad input (missing/empty file, ValueError, IsADirectoryError)
  3 = gRPC UNAVAILABLE (server down, wrong port, network)
  4 = gRPC INVALID_ARGUMENT / NOT_FOUND (bad model/lang/voice)
  1 = anything else
  130 = SIGINT

The decorator also writes the error to stderr so CI logs surface the
cause rather than the script swallowing it. Follow-up commit wires
this into each client script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
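A sketch of such a decorator, assuming it returns the exit code (so callers use `sys.exit(main())`) and that gRPC failures are recognized by duck typing: errors raised from gRPC calls also implement `grpc.Call`, whose `code()` returns a `grpc.StatusCode` enum member. Only `EXIT_BAD_INPUT` is a name taken from this PR; the other constant names are illustrative.

```python
import functools
import sys

# Exit codes from the table above (constant names other than EXIT_BAD_INPUT
# are illustrative).
EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_BAD_INPUT = 2
EXIT_UNAVAILABLE = 3
EXIT_BAD_REQUEST = 4
EXIT_SIGINT = 130


def _grpc_status_name(exc):
    """Return e.g. "UNAVAILABLE" if exc looks like a grpc.Call error, else None."""
    code = getattr(exc, "code", None)
    if not callable(code):
        return None
    try:
        return getattr(code(), "name", None)
    except Exception:
        return None


def cli_main(func):
    """Translate uncaught exceptions from func into a process exit code."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            rc = func(*args, **kwargs)
        except KeyboardInterrupt:
            return EXIT_SIGINT
        except (FileNotFoundError, IsADirectoryError, ValueError) as exc:
            print(f"error: {exc}", file=sys.stderr)  # surface the cause in CI logs
            return EXIT_BAD_INPUT
        except Exception as exc:
            print(f"error: {exc}", file=sys.stderr)
            status = _grpc_status_name(exc)
            if status == "UNAVAILABLE":
                return EXIT_UNAVAILABLE
            if status in ("INVALID_ARGUMENT", "NOT_FOUND"):
                return EXIT_BAD_REQUEST
            return EXIT_FAILURE
        return EXIT_OK if rc is None else rc

    return wrapper
```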
…ation

Address the VDR 26.02 finding that python-clients CLIs exit 0 on most
error paths across all three modalities. Each script now:

  - Wraps main() with @cli_main so gRPC and OS errors propagate to a
    real exit code instead of being printed and swallowed.
  - Calls sys.exit(main()) so the chosen exit code reaches the shell.

Script-specific fixes:

  scripts/nmt/nmt.py
    - Drop the inner request() try/except that swallowed every gRPC
      status; let cli_main translate it. Empty/whitespace --text and
      missing --text-file now return EXIT_BAD_INPUT (was: silent
      exit 0). Document --max-len-variation in decoder-token units
      (valid range [0, 256], default 20) and add a note on Arabic chunking.

  scripts/tts/talk.py
    - Reject whitespace-only --text up front (defense-in-depth pair to
      the server-side fix in riva-speech that closed the hang on
      `--text "   "`). Drop the broad `except Exception` that
      stringified gRPC errors and exited 0.

  scripts/asr/transcribe_file*.py
    - Replace `print(...); return` on missing input files with
      EXIT_BAD_INPUT. Remove the silent grpc.RpcError swallow in
      transcribe_file_offline.py.

  scripts/asr/transcribe_mic.py + realtime_asr_client.py + tts/talk.py
    - Pyaudio install hint now mentions `apt-get install -y
      portaudio19-dev` (Debian/Ubuntu) and `brew install portaudio`
      (macOS), pairing with the prereqs doc landed in documentation_2.

  scripts/tts/realtime_tts_client.py
    - Drop the module-level `from riva.client.audio_io import
      SoundCallBack` import (it was unused and pulled pyaudio in
      eagerly, defeating the lazy import). Drop the broad
      `except Exception` that mapped every failure to exit 1.

  scripts/nmt/nmt_speech_to_{text,speech}.py
    - Drop unused `import grpc`; remove the catch-all that printed
      "Error during translation" and exited 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
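A sketch of the validation-plus-exit-code pattern these script fixes share. The combined `--text`/`--input-file` parser is hypothetical (each real script has its own flags), and `@cli_main` is left off so the snippet runs standalone.

```python
import argparse
import os
import sys

EXIT_BAD_INPUT = 2  # same value as the "bad input" row of the cli_main table


def main(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a validated client entry point")
    parser.add_argument("--text", help="text to synthesize or translate")
    parser.add_argument("--input-file", help="audio file to transcribe")
    args = parser.parse_args(argv)

    # Whitespace-only --text is rejected before any RPC is attempted.
    if args.text is not None and not args.text.strip():
        print("error: --text must not be empty or whitespace-only", file=sys.stderr)
        return EXIT_BAD_INPUT

    # A missing (or non-regular) input file is bad input, not a silent exit 0.
    if args.input_file is not None and not os.path.isfile(args.input_file):
        print(f"error: input file not found: {args.input_file}", file=sys.stderr)
        return EXIT_BAD_INPUT

    # ... issue the gRPC request here and let errors propagate ...
    return 0

# In a script, the entry point is: sys.exit(main())
```

Returning the code rather than calling `sys.exit` inside `main` keeps the function testable and lets a wrapping decorator still translate exceptions.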
VDR 26.02 found that realtime_tts_client.py silently ignored --voice and
fell back to the server default (Mia). Tracing the WebSocket flow, the
synthesize_session.update payload was built by deep-mutating the response
from POST /v1/realtime/synthesis_sessions — an InitialSynthesisSessionConfig
that carries id/object/client_secret fields not present in
BaseSynthesisSessionConfig (the type the server validates the update
against). Carrying those keys through to the override, plus the shallow
.copy() + _safe_update_config nested-dict mutation, was the path that let
the voice_name override fail to land on published 26.02 NIMs.

Build the update payload explicitly from CLI args instead, so only fields
the user actually overrode reach the server, in the exact shape documented
in the SynthesisSessionUpdateMessage schema. Bump the override summary to
INFO so users can see which fields were sent. After the
synthesize_session.updated response, compare the server-applied voice_name
and language_code against what was requested and log a WARNING on
mismatch — defense-in-depth so any future server-side drop surfaces in the
client log instead of as a wrong-sounding audio file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
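A sketch of building the update from CLI overrides only and then checking what the server applied. The `type` value and the `voice_name`/`language_code` field names come from the commit message; wrapping the overrides in a flat `session` object is an assumption, so the real nesting should be checked against the SynthesisSessionUpdateMessage schema.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("realtime_tts_sketch")


def build_session_update(
    voice_name: Optional[str] = None,
    language_code: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a synthesize_session.update message from CLI overrides only.

    Starts from an empty dict instead of copying the session-creation
    response, so server-only keys (id, object, client_secret) never leak in.
    The flat "session" layout is assumed, not taken from the schema.
    """
    session: Dict[str, Any] = {}
    if voice_name is not None:
        session["voice_name"] = voice_name
    if language_code is not None:
        session["language_code"] = language_code
    logger.info("Sending session overrides: %s", session)
    return {"type": "synthesize_session.update", "session": session}


def check_applied(requested: Dict[str, Any], applied: Dict[str, Any]) -> List[str]:
    """Log a WARNING for each requested field the server did not apply."""
    mismatched = []
    for key in ("voice_name", "language_code"):
        if key in requested and applied.get(key) != requested[key]:
            logger.warning(
                "Server applied %s=%r but %r was requested",
                key, applied.get(key), requested[key],
            )
            mismatched.append(key)
    return mismatched
```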
Only import parse_custom_configuration and pass custom_configuration to
synthesize/synthesize_online when --custom-configuration is supplied,
so talk.py keeps working against older riva-client wheels that lack
the function and the kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
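A sketch of the guard, assuming a hypothetical helper `build_synthesize_kwargs` and an import location for `parse_custom_configuration` that may differ from the real one. Because the import sits inside the branch, an older wheel that lacks the function only fails if the user actually passes `--custom-configuration`.

```python
from typing import Any, Dict, Optional


def build_synthesize_kwargs(
    text: str,
    voice_name: Optional[str] = None,
    custom_configuration: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble synthesis kwargs, adding custom_configuration only on request."""
    kwargs: Dict[str, Any] = {"text": text, "voice_name": voice_name}
    if custom_configuration:
        # Lazy import: older wheels without this helper are only affected
        # when the flag is used. The module path is a hypothetical location.
        from riva.client.tts import parse_custom_configuration

        kwargs["custom_configuration"] = parse_custom_configuration(custom_configuration)
    return kwargs
```

The result would then be splatted into the call (e.g. `synthesize(**kwargs)`), so the `custom_configuration` keyword never reaches a wheel that does not accept it.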
atomer-nvidia and others added 4 commits May 18, 2026 19:08
cli_main and EXIT_BAD_INPUT were added recently in argparse_utils and
are not present in older riva-client wheels. Wrap their imports in a
try/except across all asr/nmt/tts client scripts, falling back to a
no-op decorator and EXIT_BAD_INPUT=2 so the scripts keep running
against older installed wheels (only the structured exit codes are
lost in that case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
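The fallback can be sketched as follows. The `riva.client.argparse_utils` module path is inferred from the commit's mention of `argparse_utils`, and the decorated `main` is a hypothetical example.

```python
import sys

try:
    # Newer riva-client wheels provide both names (module path assumed).
    from riva.client.argparse_utils import EXIT_BAD_INPUT, cli_main
except ImportError:
    # Older wheels: keep running; only the structured exit codes are lost.
    EXIT_BAD_INPUT = 2

    def cli_main(func):
        """No-op stand-in for the real decorator."""
        return func


@cli_main
def main(argv=None):
    """Hypothetical entry point: report bad input when no text is given."""
    text = " ".join(argv or []).strip()
    if not text:
        print("error: no input text", file=sys.stderr)
        return EXIT_BAD_INPUT
    return 0
```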
fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines
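One way to implement this check, sketched with a hypothetical `load_text_file` helper: blank and whitespace-only lines are dropped, and an empty result raises `ValueError`, which the cli_main exit-code table maps to 2 (`EXIT_BAD_INPUT`). Whether the actual fix raises or returns the code directly is not stated here.

```python
from typing import List


def load_text_file(path: str) -> List[str]:
    """Return stripped, non-blank lines from a UTF-8 text file.

    Raises ValueError when nothing remains, so a cli_main-style wrapper
    exits with EXIT_BAD_INPUT (2) instead of exiting 0 with nothing translated.
    """
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        raise ValueError(f"--text-file {path!r} contains no non-empty lines")
    return lines
```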
Default TTS sample rate to 22050 Hz to match HTTP API

Aligns scripts/tts/talk.py and riva.client.SpeechSynthesisService
synthesize/synthesize_online defaults with the HTTP /v1/audio/synthesize
default, so the same call over either transport yields the same audio
when the rate is left unspecified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
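A sketch of a 22050 Hz default flowing into the output WAV header. The `--sample-rate-hz` flag name and the `write_wav` helper are assumptions for illustration, as is the mono 16-bit PCM sample format.

```python
import argparse
import io
import wave

DEFAULT_SAMPLE_RATE_HZ = 22050  # HTTP /v1/audio/synthesize default, per this commit


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a TTS sample-rate flag")
    parser.add_argument("--sample-rate-hz", type=int, default=DEFAULT_SAMPLE_RATE_HZ)
    return parser.parse_args(argv)


def write_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container at the given rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate_hz)
        wf.writeframes(pcm)
    return buf.getvalue()
```

With the rate left unspecified, one second of audio is 22050 frames whichever transport produced it, so gRPC and HTTP output stay byte-compatible in length and header.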
Comment thread scripts/asr/transcribe_file_offline.py
Comment thread riva/client/realtime.py Outdated
Comment thread riva/client/realtime.py Outdated
rmittal-github merged commit 20f1a48 into nvidia-riva:main on May 26, 2026
rmittal-github pushed a commit that referenced this pull request May 26, 2026
* Make pyaudio an optional dependency in audio_io

* Add cli_main decorator with structured CLI exit codes

* Wire cli_main into asr/nmt/tts client scripts and tighten input validation

* Send override-only payload for realtime TTS session update

* Guard TTS custom_configuration usage for backwards compatibility

* Guard cli_main/EXIT_BAD_INPUT imports for backwards compatibility

* fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines

* Default TTS sample rate to 22050 Hz to match HTTP API

* Addressing review comments

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yuvaraj Dharavath <ydharavath@nvidia.com>
Labels

None yet

Projects

None yet

3 participants